Filtering queried data on data stores

ABSTRACT

A data set may be distributed over many data stores, and a query may be distributively evaluated by several data stores with the results combined to form a query result (e.g., utilizing a MapReduce framework). However, such architectures may violate security principles by performing sophisticated processing, including the execution of arbitrary code, on the same machines that store the data. Instead of processing queries, a data store may be configured only to receive requests specifying one or more filtering criteria, and to provide the data items satisfying the filtering criteria. A compute node may apply a query by generating a request including one or more filter criteria, providing the request to a data node, and applying the remainder of the query (including sophisticated processing, and potentially the execution of arbitrary code) to the data items provided by the data node, thereby improving the security and efficiency of query processing.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/979,467, filed Dec. 28, 2010, the entire contents of which are hereby incorporated herein by reference for all purposes.

BACKGROUND

Within the field of computing, many scenarios involve a query to be applied to a data set stored by one or more data stores. For example, a user or a data-driven process may request a particular subset of data by submitting to the data store a query specified in a query language, such as the Structured Query Language (SQL). The data store may receive the query, process it using a query processing engine (e.g., a software pipeline comprising components that perform various parsing operations on the query, such as associating names in the query with the named objects of the database and identifying the operations specified by various operators), apply the operations specified by the parsed query to the stored data, and return the query result that has been specified by the query. The query result may comprise a set of records specified by the query, a set of attributes of such records, or a result calculated from the data (e.g., a count of records matching certain query criteria). The result may also comprise a report of an action taken with respect to the stored data, such as a creation or modification of a table or an insertion, update, or deletion of records in a table.

In many such scenarios, the database may be distributed over several, and potentially a large number of, data stores. For example, in a distributed database, different portions of the stored data may be stored in one or more data stores in a server farm. When a query is received to be applied to the data set, a machine receiving the query may identify which data stores are likely to contain the data targeted by the query, and may send the query to one or more of those data stores. Each such data store may apply the query to the data stored therein, and may send back a query result. If the query was applied by two or more data stores, the query results may be combined to generate an aggregated query result. In some scenarios, one machine may coordinate the process of distributing the query to the involved data stores and aggregating the query results. Techniques such as the MapReduce framework have been devised to achieve such distribution and aggregation in an efficient manner.

The data engines utilized by such data stores may be quite sophisticated, and may be capable of applying many complicated computational processes to such data stores, such as database transactions, journaling, the execution of stored procedures, and the acceptance and execution of agents. The query language itself may promote the complexity of queries to be handled by the data store, including nesting, computationally intensive similarity comparisons of strings and other data types, and modifications to the structure of the database. Additionally, the logical processes applied by the query processing engine of a data store may be able to answer complicated queries in an efficient manner, and may even improve the query by using techniques such as query optimization. As a result of these and other processes, the evaluation of a query by a data store may consume a large amount of computational resources.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

While it may be advantageous to equip a data store with a sophisticated query processing engine that is capable of processing sophisticated transactions, some disadvantages may also arise. In particular, it may be disadvantageous or inefficient to configure a data store to execute a complex query on locally stored data. For example, a data store may happen to store data that is in particularly high demand, but the query processing engine may be taxed by the application of a complex query to the stored data while other queries (some of which may be very simple) remain pending. A complex query may therefore create a bottleneck that reduces the capacity and throughput of query evaluation.

As a second example, a distributed database architecture wherein a data store also executes sophisticated queries may compromise some security principles, since the machines that are storing the data are also permitted to execute potentially hazardous or malicious operations on the data. Additionally, the query processing engines may even permit the execution of arbitrary code on the stored data (e.g., an agent scenario wherein an executable module is received from a third party and executed against the stored data). A security principle that separates the storage of the data (on a first set of machines) and the execution of complex computation, including arbitrary code, on the data (allocated to a second set of machines) may present several security advantages, such as a data item partition between stored data and a compromised machine.

These and other advantages may arise from removing complex processing of data from the data stores (e.g., the machines of a server farm that are configured to store the data of a distributed database). However, it may also be disadvantageous to configure the data stores with no processing capabilities, e.g., as data stores functioning purely as data storage devices, which are capable only of providing a requested data object (e.g., an entire table) or making specified alterations thereto. For example, another machine may request from the data store only a subset of data, such as a subset of records from a table that satisfy a particular filter criterion. However, if the request specifies only a small number of records in a table containing many records, sending the entire table may be unduly inefficient, particularly given a bandwidth constraint between the machine and the data store in a networked environment.

Presented herein are techniques for configuring a data store to fulfill a request for data stored therein. In accordance with these techniques, the data store does not utilize a query processing engine that might impose significant computational costs, reduce performance in fulfilling requests, and/or permit the execution of arbitrary code on the stored data. However, the data store is also capable of providing only a subset of data stored therein. The data store achieves this result by accepting requests specifying one or more filter criteria, each of which reduces the requested amount of data in a particular manner. For example, the request may include a filter criterion specifying a particular filter criterion value, and may request only records having that filter criterion value for a particular filter criterion (e.g., in a data store configured to store data representing events, the filter criterion may identify a type of event or a time when the event occurred). The request therefore specifies only various filter criteria, and the data store is capable of providing the data that satisfy the filter criteria, but is not configured to process queries that may specify complex operations. This configuration may therefore promote the partitioning of a distributed database into a set of data nodes configured to store and provide data, and a set of compute nodes capable of applying complex queries (including arbitrary code).

To the accomplishment of the foregoing and related ends, the following description and annexed drawings set forth certain illustrative aspects and implementations. These are indicative of but a few of the various ways in which one or more aspects may be employed. Other aspects, advantages, and novel features of the disclosure will become apparent from the following detailed description when considered in conjunction with the annexed drawings.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of an exemplary scenario featuring an application of a query to a data set distributed over several data stores.

FIG. 2 is an illustration of an exemplary scenario featuring an application of a request for data from a data set stored by a data store.

FIG. 3 is an illustration of an exemplary scenario featuring an application of a request featuring at least one filter criterion for data from a data set stored by a data store in accordance with the techniques presented herein.

FIG. 4 is a flow chart illustrating an exemplary method of fulfilling requests targeting a data set of a data store.

FIG. 5 is a flow chart illustrating an exemplary method of applying a query to a data set stored by a data store.

FIG. 6 is an illustration of an exemplary computer-readable medium comprising processor-executable instructions configured to embody one or more of the provisions set forth herein.

FIG. 7 is an illustration of an exemplary scenario featuring an indexing of data items stored by a data set.

FIG. 8 is an illustration of an exemplary scenario featuring a partitioning of data items stored by a data set.

FIG. 9 is an illustration of an exemplary scenario featuring a data item processor set comprising data item processors configured to filter data items in response to a request featuring at least one filter criterion.

FIG. 10 illustrates an exemplary computing environment wherein one or more of the provisions set forth herein may be implemented.

DETAILED DESCRIPTION

The claimed subject matter is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. It may be evident, however, that the claimed subject matter may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to facilitate describing the claimed subject matter.

Within the field of computing, many scenarios involve a data set, such as a database stored by a data store. The data store may comprise a computer equipped with a storage component (e.g., a memory circuit, a hard disk drive, a solid-state storage device, or a magnetic or optical storage disc) whereupon a set of data is stored, and may be configured to execute software that satisfies requests to access the data that may be received from various users and/or processes. In many such scenarios, the stored data may be voluminous, potentially scaling to millions or billions of records stored in one table and/or a large number of tables, and/or complex, such as a large number of interrelationships among records and tables and sophisticated constraints upon the types of data that may be stored therein.

In some such scenarios, the data set may be stored on a plurality of data stores. As a first example, two or more data stores may store identical copies of the data set. This configuration may be advantageous for promoting availability (e.g., one data store may respond to a request for data when another data store is occupied or offline). As a second example, the data set may be distributed over the data stores, such that each data store stores a portion of the data set. This configuration may be advantageous for promoting efficiency (e.g., a distribution of the computational burden of satisfying a request for a particular set of data, such as a particular record, may be limited to the data store that is storing the requested data). In many such examples, dozens or hundreds of data stores may be provided, such as in a server farm comprising a very large number of data stores that together store and provide access to a very large data set.

FIG. 1 presents an exemplary scenario 10 featuring a first architecture for applying a query 14 submitted by a user 12 to a data set 20, comprising a set of data tables 22 each storing a set of records 26 having particular attributes 24. In this exemplary scenario 10, the data set 20 has been distributed across many data stores 18 in various ways. As a first example, the data set 20 may be vertically distributed; e.g., the data set 20 may comprise several data tables 22 storing different types of records 26, and a first data store 18 may store the records 26 of a first data table 22 while a second data store 18 may store the records 26 of a second data table 22. As a second example, the data set 20 may be horizontally distributed; e.g., for a particular data table 22, a first data store 18 may store a first set of records 26, while a second data store 18 may store a second set of records 26. This distribution may be arbitrary, or may be based on a particular attribute 24 of the data table 22 (e.g., for an attribute 24 specifying an alphabetic string, a first data store 18 may store records 26 beginning with the letters ‘A’ through ‘L’, while a second data store 18 may store records 26 beginning with the letters ‘M’ to ‘Z’). Other ways of distributing the data tables 22 and data records 26 may also be devised; e.g., for a particular data table 22, a first data store 18 may store a first set of attributes 24 for the records 26 and a second data store 18 may store a second set of attributes 24 for the records 26, or two data stores 18 may redundantly store the same records 26 in order to promote the availability of the records 26 and the rapid evaluation of queries involving the records 26.
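The attribute-based horizontal distribution described above may be sketched as follows. This is a minimal illustration only: the store names, the `name` attribute, and the functions `route` and `distribute` are hypothetical and do not appear in the figures; only the ‘A’ through ‘L’ and ‘M’ to ‘Z’ ranges are taken from the example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical partition table: (first letter, last letter, data store name).
# The A-L / M-Z split mirrors the example in the text.
PARTITIONS: List[Tuple[str, str, str]] = [
    ("A", "L", "store_1"),
    ("M", "Z", "store_2"),
]

def route(record: Record, attribute: str = "name") -> str:
    """Return the name of the data store responsible for a record."""
    first = record[attribute][:1].upper()
    for low, high, store in PARTITIONS:
        if low <= first <= high:
            return store
    raise ValueError(f"no partition covers {record[attribute]!r}")

def distribute(records: List[Record]) -> Dict[str, List[Record]]:
    """Group records by the data store that should hold them."""
    placement: Dict[str, List[Record]] = {store: [] for _, _, store in PARTITIONS}
    for record in records:
        placement[route(record)].append(record)
    return placement
```

Under these assumptions, a record whose `name` begins with ‘L’ would be placed on `store_1`, and one beginning with ‘z’ (compared case-insensitively) on `store_2`.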

In many such scenarios, a user or process may submit a query to be applied to the data set 20. For example, a Structured Query Language (SQL) query may comprise one or more operations to be applied to the data set 20, such as selecting records 26 from one or more data tables 22 having particular values for particular attributes 24, projecting particular attributes 24 of such records 26, joining attributes 24 of different records 26 to create composite records 26, and applying various other operations to the selected data (e.g., sorting, grouping, or counting the records) before presenting a query result. The query may also specify various alterations of the data set 20, such as inserting new records 26, setting various attributes 24 of one or more records 26, deleting records 26, establishing or terminating relationships between semantically related records 26, and altering the layout of the data set 20, such as by inserting, modifying, or deleting one or more data tables 22. These operations may also be chained together into a set, sequence, or conditional hierarchy of such operations. Variants of the Structured Query Language also support more complex operations, such as sophisticated data searching (e.g., support for identifying records matching a regular expression), journaling (e.g., recording the application of operations that may later be reversed), and transactions (e.g., two or more operations where either all operations are performed successfully or none are applied). Still other variants of the Structured Query Language may support the execution of code on the data store; e.g., a query may specify or invoke a stored procedure that is to be executed by the data store on the stored data, or may include an agent, such as an interpretable script or executable binary that is provided to the data store for local execution.
In order to evaluate and fulfill such queries, the data store 18 may comprise a query processing engine, such as a software pipeline comprising components that perform various parsing operations on the query, such as associating names in the query with the named objects of the database and identifying the operations specified by various operators. By lexically parsing the language of a query (e.g., identifying various components of the query according to the syntax rules of the query language), identifying the operations specified by each component of the query and the logical structure and sequence of the operations, and invoking a component that is capable of fulfilling the operation, the data store 18 may achieve the evaluation and fulfillment of the query.

In these and other scenarios, the task of applying a complex query to a data set distributed across many data stores may present many implementation challenges. Many techniques and architectural frameworks have been proposed to enable such application in an efficient and automated manner.

The exemplary scenario 10 of FIG. 1 further presents one technique that is often utilized to apply a query 14 to a data set 20 distributed across many data stores 18. In this exemplary scenario 10, a user 12 may submit a query 14 comprising a set of operations 16 that may be applied against the data set 20. Moreover, the operations 16 may be chained together in a logical sequence, e.g., using Boolean operators to specify that the results of particular operations 16 are to be utilized together. The query 14 may be delivered to a MapReduce server 28, comprising a computer configured to apply a “MapReduce” technique to distribute the query 14 across the data stores 18 that are storing various portions of the data set 20. For example, the MapReduce server 28 may identify that various operations 16 within the query 14 target various portions of the data set 20 that are respectively stored by particular data stores 18. For example, a first operation 16 may target the data stored by a first data store 18 (e.g., a Select operation applied to a data table 22 and/or set of records 26 stored by the first data store 18), while a second operation 16 may target the data stored by a second data store 18. Accordingly, the MapReduce server may decompose the query 14 into various query portions 30, each comprising one or more operations to be performed by a particular data store 18. The data store 18 may receive the query portion 30, apply the operations 16 specified therein, and generate a query result 32 that may be delivered to the MapReduce server 28 (or to another data store 18 for further processing). The MapReduce server 28 may then compose the query results 32 provided by the data stores 18 to generate a query result 34 that may be provided to the user 12 in response to the query 14. In this manner, the data stores 18 and the MapReduce server 28 may interoperate to achieve the fulfillment of the query 14.
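The decomposition and composition described above may be illustrated with a simplified sketch. It does not use any actual MapReduce framework; the functions `run_portion` and `coordinate_count`, and the representation of a query portion as a Python predicate, are assumptions made for illustration. Note that in this architecture the predicate (arbitrary code) runs on the same machine that holds the records.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]

def run_portion(store_records: List[Record], predicate: Predicate) -> int:
    """Query portion evaluated ON a data store: count local matching records.

    The predicate is arbitrary code executing next to the stored data.
    """
    return sum(1 for record in store_records if predicate(record))

def coordinate_count(stores: Dict[str, List[Record]], predicate: Predicate) -> int:
    """Coordinator role: send the portion to each store, then combine results."""
    # "Map" step: each data store produces a partial query result.
    partial_results = {
        name: run_portion(records, predicate) for name, records in stores.items()
    }
    # "Reduce" step: compose the partial results into one query result.
    return sum(partial_results.values())
```

For example, with two stores holding employee records, `coordinate_count(stores, lambda r: r["dept"] == "sales")` would return the total number of matching records across both stores.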

The exemplary scenario 10 of FIG. 1 may present some advantages (e.g., an automated parceling out of the query 14 to multiple data stores 18, which may enable a concurrent evaluation of various query portions 30 that may expedite the evaluation of the query 14). However, the exemplary scenario 10 may also present some disadvantages. In particular, it may be desirable to devise an architecture for a distributed data set, such as a distributed database, wherein the storage and accessing of data is performed on a first set of devices, while complex computational processes are performed on a second set of devices. Such a partitioning may be advantageous, e.g., for improving the security of the data set 20. For example, queries 14 to be applied to the data set 20 may be computationally expensive (e.g., involving a large amount of memory), paradoxical (e.g., a recursive query that does not end or that cannot logically be evaluated), or malicious (e.g., overtly or covertly involving an unauthorized disclosure or modification of the data set 20). In some scenarios, the computation may involve the execution of code, such as a query 14 that invokes a stored procedure that has been implemented on a data store 18, or mobile agent scenarios, wherein a third party may provide an “agent” (e.g., an interpretable script or partially or wholly compiled executable) that may be applied to the data set 20. Therefore, the security of the data set 20 may be improved by restricting complex computation to a particular set of computers that may be carefully monitored, and that may be suspended, taken offline, or replaced if such computers appear to be operating in ways that may damage the data set 20. However, the exemplary scenario 10 of FIG. 1 does not involve such partitioning. Rather, the data stores 18 that store various portions of the data set 20 also execute query portions 30 upon such data, and therefore fail to separate the accessing of the data set 20 from computation performed thereupon.

A second disadvantage that may arise in the exemplary scenario 10 of FIG. 1 involves the performance of the data set 20. For example, a particular data store 18 may be configured to store a portion of the data set 20 that, temporarily or chronically, is frequently accessed, such that the data store 18 receives and handles many queries 14 involving the portion of the data set 20 in a short period of time. However, if the data store 18 is also configured to perform complex computational processing of the stored data, a query 14 involving complex operations may consume computing resources of the data store 18 (e.g., memory, processor capacity, and bandwidth) that may not be available to fulfill other queries 14. Therefore, a single complex query 14 may forestall the evaluation and fulfillment of other queries 14 involving the same data stored by the data store 18. By contrast, if complex computation involving this data were partitioned from the storage of such data, many computers may be configured to handle the queries 14 in parallel, and a complex query 14 that ties up the resources of one computer may not affect the evaluation or fulfillment of other queries 14 handled by other computers.

In view of these and other disadvantages that may arise from the architecture presented in the exemplary scenario 10 of FIG. 1, it may be desirable to separate the storage and accessing of data in a data set 20 from complex computational queries that may be applied to such data. However, a rigid partitioning, where the data store 18 only provides low-level access and a compute node provides all computation, may also be inefficient.

FIG. 2 presents an exemplary scenario 40 wherein a data store 18 is configured to store a data set 20 comprising a large number of records 26 (e.g., 50,000 records). A user 12 may submit a query 14, which may be received and wholly evaluated by a compute node 42. The compute node 42 may comprise, e.g., a query processing engine, which may lexically parse the query 14, identify the operations 16 specified therein, and invoke various components to perform such operations 16, including retrieving data from the data store 18. For example, instead of sending a query 14 or a query portion 30 to the data store 18, the compute node 42 may simply send a request 44 for a particular set of records 26, such as the records 26 comprising a data table 22 of the data set 20. The data store 18 may respond with a request result 48, comprising the requested records 26, to which the compute node 42 may apply some complex computation (e.g., the operations 16 specified in the query 14) and may return a query result 34 to the user 12. However, this exemplary scenario 40 illustrates an inefficiency in this rigid partitioning of responsibilities between the compute node 42 and the data store 18. For example, the query 14 may request the retrieval of a single record 26 (e.g., a record 26 of an employee associated with a particular identifier), but the data table 22 stored by the data store 18 may include many such records 26. Accordingly, the data store 18 may provide a request result 48 comprising 50,000 records 26 to the compute node 42, even though only one such record 26 is included in the query result 34. Moreover, it may be easy to identify this record 26 from the scope of the query 14 (e.g., if the query 14 identifies the requested record 26 according to an indexed field having unique identifiers for respective records 26), but because the data store 18 cannot perform computations involved in the evaluation of the query 14, this comparatively simple filtering is not performed by the data store 18.
This inefficiency may become particularly evident, e.g., if the request result 48 is sent to the compute node 42 over a network 46, which may have limited capacity. The sending of many records 26 over the network 46 may impose a rate-limiting factor on the completion of the query 14, thereby imposing a significant delay in the fulfillment of a comparatively simple query 14 involving a small query result 34. These and other disadvantages may arise from a hard partitioning of the responsibilities of data stores 18 and compute nodes 42 comprising a data set 20.
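The inefficiency of the rigid partitioning of FIG. 2 may be made concrete with a small simulation. The table contents, the `id` field, and the functions `request_table` and `find_employee` are hypothetical; only the figure of 50,000 records and the single-record query come from the example above.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical employee table held by the data store (50,000 records, as in the text).
TABLE: List[Record] = [{"id": i, "name": f"employee-{i}"} for i in range(50_000)]

def request_table(table: List[Record]) -> List[Record]:
    """Data store with no filtering ability: it can only return the whole table."""
    return list(table)

def find_employee(employee_id: int) -> Tuple[List[Record], int]:
    """Compute node: fetch everything, then filter locally.

    Returns the matching records and the number of records transferred.
    """
    transferred = request_table(TABLE)
    matches = [record for record in transferred if record["id"] == employee_id]
    return matches, len(transferred)

matches, records_transferred = find_employee(123)
# Fraction of the transfer that did not contribute to the query result.
wasted_fraction = (records_transferred - len(matches)) / records_transferred
```

In this sketch, one record is needed but all 50,000 cross the network, so nearly the entire transfer is wasted; this is the cost that filtering at the data store is intended to avoid.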

Presented herein are techniques for configuring a data set 20 to evaluate queries 14. These techniques may be devised, e.g., in view of the advantages and disadvantages in the exemplary scenario 10 of FIG. 1 and the exemplary scenario 40 of FIG. 2. In accordance with these techniques, a data store 18 may be configured to store one or more data items of a data set 20 (e.g., various tables 22, attributes 24, and/or records 26 of the data set 20), and to participate in the evaluation of a query 14 against such data items. As compared with the exemplary scenario 10 of FIG. 1, the data store 18 is not configured to evaluate a query 14; e.g., the data store 18 may not include a query processing engine, and may refuse to accept or evaluate queries 14 formulated in a query language, such as a Structured Query Language (SQL) query. Conversely, the data store 18 is not limited to providing one or more portions of the data set 20 in response to a request 44, which may cause inefficiencies arising from a rigid partitioning, such as illustrated in the exemplary scenario 40 of FIG. 2. Rather, in accordance with these techniques, the data store 18 is configured to accept requests 44 including one or more filtering criteria that define a filtered data subset. For example, the data store 18 may store one or more data tables 22 comprising various records 26, but a small number of attributes 24 for the records 26 may be indexed. The filtering may involve identifying, retrieving, and providing a data subset of the data set 20, comprising the records 26 having a particular value for one of the indexed attributes 24. Because the application of the filtering criterion to the data set 20 may result in a significant reduction of data to be sent in the filtered data subset 58 while consuming a small fraction of the computational resources involved in the evaluation of a query 14, the data store 18 may be configured to perform this filtering in response to the request 44.
However, the data store 18 may be configured to refrain from performing more complex computational processes; e.g., the data store 18 may wholly omit a query processing engine, may refuse to accept queries 14 specified in a query language, or may reject requests 44 specifying non-indexed attributes 24. In this manner, the techniques presented herein may achieve greater efficiency and security than in the exemplary scenario 10 of FIG. 1, while also avoiding the disadvantages presented in the exemplary scenario 40 of FIG. 2.
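One way such a data store might behave is sketched below, under stated assumptions: the class `FilteringDataStore`, its method names, and the representation of a request as a dictionary mapping an indexed attribute to a required value are all hypothetical, and equality matching on hashable values is assumed. The sketch accepts only filter criteria over indexed attributes and rejects anything else, including query-language strings.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Set

Item = Dict[str, Any]

class FilteringDataStore:
    """Stores data items and answers filter requests; evaluates no queries."""

    def __init__(self, indexed_attributes: Iterable[str]) -> None:
        self._items: List[Item] = []
        # attribute -> value -> positions of the items holding that value
        self._index: Dict[str, Dict[Any, List[int]]] = {
            attr: defaultdict(list) for attr in indexed_attributes
        }

    def store(self, item: Item) -> None:
        """Store a received data item and update the indexes."""
        position = len(self._items)
        self._items.append(item)
        for attr, by_value in self._index.items():
            if attr in item:
                by_value[item[attr]].append(position)

    def request(self, criteria: Dict[str, Any]) -> List[Item]:
        """Return the filtered data subset satisfying every filter criterion."""
        if not isinstance(criteria, dict) or not criteria:
            # e.g., an SQL string: this store has no query processing engine.
            raise TypeError("only filter-criteria requests are accepted")
        unknown = [attr for attr in criteria if attr not in self._index]
        if unknown:
            raise ValueError(f"non-indexed attributes: {unknown}")
        positions: Optional[Set[int]] = None
        for attr, value in criteria.items():
            hits = set(self._index[attr].get(value, []))
            positions = hits if positions is None else positions & hits
        return [self._items[p] for p in sorted(positions or ())]
```

Restricting requests to index lookups keeps the work done by the store small and predictable, which is the trade-off the paragraph above describes: a significant reduction in transferred data for a small fraction of the cost of evaluating a full query.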

FIG. 3 presents an illustration of an exemplary scenario 50 featuring an application of the techniques presented herein to apply a query 14 submitted by a user 12 to a data set 20 storing various data items 52 in order to generate and provide a query result 34. In this exemplary scenario 50, access to the data set 20 may be achieved through a data store 18, which, in turn, may be accessed through a compute node 42. However, if the user 12 or the compute node 42 were to submit the query 14 to the data store 18, the data store 18 may refuse to accept the query 14, or may be incapable of evaluating the query 14. (Alternatively, the data store 18 may accept and evaluate a query 14 only in particular circumstances, e.g., where the query 14 is submitted by an administrator.) Instead, the user 12 (or an automated process) may submit the query 14 to the compute node 42, which may endeavor to interact with the data store 18 to evaluate the query and provide a query result 34. In particular, the compute node 42 may examine the query 14 to identify a request 44 comprising one or more filter criteria 54 that may specify a retrieval of particular data items 52 from the data store 18 (e.g., identifying one or more operations 16 of the query 14 that may be expressed as a request 44 for data items 52 satisfying one or more filter criteria 54). The data store 18 is configured to receive data items 52 and store received data items 52 in a storage component (e.g., a memory circuit, a hard disk drive, a solid-state storage device, or a magnetic or optical disc) as part of the data set 20. Additionally, the data store 18 is configured to receive requests 44 comprising one or more filter criteria 54. Upon receiving a request 44, the data store 18 may perform a filtering 56 to identify the data items 52 that satisfy the filter criteria 54, and generate a filtered data subset 58 to be returned to the compute node 42.
The compute node 42 may receive the filtered data subset 58 and may apply the remainder of the query 14 (e.g., performing complex computations specified by the operations 16 of the query 14 that were not expressed in the request 44). In some such scenarios, the compute node 42 may send a second or further request 44 to the data set 20 specifying other filter criteria 54, and may utilize the second or further filtered data subsets 58 in the computation. Eventually, the compute node 42 may generate a query result 34, which may be presented to the user 12 (or an automated process) in response to the query 14. In this manner, the configuration of the data store 18, and optionally the compute node 42, may enable the fulfillment of queries 14 in a more efficient and secure manner than presented in the exemplary scenario 10 of FIG. 1 and/or the exemplary scenario 40 of FIG. 2.

FIG. 4 presents a first embodiment of these techniques, illustrated as an exemplary method 60 of fulfilling requests 44 targeting a data set 20. The exemplary method 60 may be performed, e.g., by a data store 18 configured to store or having access to part or all of the data set 20. Additionally, the exemplary method 60 may be implemented, e.g., as a set of software instructions stored in a memory component (e.g., a system memory circuit, a platter of a hard disk drive, a solid state storage device, or a magnetic or optical disc) of the data store 18, that, when executed by the processor of the data store 18, cause the processor to perform the techniques presented herein. The exemplary method 60 begins at 62 and involves executing 64 the instructions on the processor. More specifically, the instructions are configured to, upon receiving a data item 52, store 66 the data item 52 in the data set 20. The instructions are also configured to, upon receiving 68 a request 44 specifying at least one filter criterion 54, retrieve 70 the data items 52 of the data set 20 satisfying the at least one filter criterion to generate a filtered data subset 58, and to send 72 the filtered data subset 58 in response to the request 44. In this manner, the exemplary method 60 achieves the fulfillment of the request 44 to access the data set 20 without exposing the data store 18 to the security risks, inefficiencies, and consumption of computational resources involved in evaluating a query 14, and so ends at 74.

FIG. 5 presents a second embodiment of these techniques, illustrated as an exemplary method 80 of applying a query 14 to a data set 20 stored by a data store 18. The exemplary method 80 may be performed, e.g., on a device, such as a compute node 42, having a processor. Additionally, the exemplary method 80 may be implemented, e.g., as a set of software instructions stored in a memory component (e.g., a system memory circuit, a platter of a hard disk drive, a solid state storage device, or a magnetic or optical disc) of the compute node 42 or other device, that, when executed by the processor, cause the processor to perform the techniques presented herein. The exemplary method 80 begins at 82 and involves executing 84 the instructions on the processor. More specifically, the instructions are configured to, from the query 14, generate 86 a request 44 specifying at least one filter criterion 54. The instructions are also configured to send 88 the request 44 to the data store 18, and, upon receiving from the data store 18 a filtered data subset 58 in response to the request 44, apply 90 the query 14 to the filtered data subset 58. In this manner, the exemplary method 80 achieves the fulfillment of a query 14 to the data set 20 without exposing the data store 18 to the security risks, inefficiencies, and consumption of computational resources involved in evaluating the query 14, and so ends at 92.
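The division of labor in the exemplary method 80 may be sketched as follows. The functions `data_store_fetch` and `apply_query` are hypothetical names introduced for illustration, and the sketch assumes that the query has already been split into a set of equality filter criteria and a "remainder" callable holding the rest of the computation.

```python
def data_store_fetch(data_set, request):
    """Stand-in for the data store: return items matching every criterion."""
    return [
        item for item in data_set
        if all(item.get(attr) == value for attr, value in request.items())
    ]


def apply_query(data_set, filter_criteria, remainder):
    """Hypothetical compute-node routine following the steps of method 80."""
    # Generate 86 a request that carries only the filter criteria.
    request = dict(filter_criteria)
    # Send 88 the request and receive the filtered data subset.
    filtered_subset = data_store_fetch(data_set, request)
    # Apply 90 the remainder of the query on the compute node.
    return remainder(filtered_subset)


events = [
    {"user": "carol", "type": 12},
    {"user": "alice", "type": 12},
    {"user": "bob", "type": 7},
    {"user": "alice", "type": 12},
]

# Remainder of the query: distinct user names, sorted (arbitrary code that
# runs on the compute node rather than on the data store).
query_result = apply_query(
    events,
    {"type": 12},
    lambda items: sorted({item["user"] for item in items}),
)
```

Only the filtering runs where the data is stored; the sorting and deduplication in the remainder execute on the compute node.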

Still another embodiment involves a computer-readable medium comprising processor-executable instructions configured to apply the techniques presented herein. Such computer-readable media may include, e.g., computer-readable storage media involving a tangible device, such as a memory semiconductor (e.g., a semiconductor utilizing static random access memory (SRAM), dynamic random access memory (DRAM), and/or synchronous dynamic random access memory (SDRAM) technologies), a platter of a hard disk drive, a flash memory device, or a magnetic or optical disc (such as a CD-R, DVD-R, or floppy disc), encoding a set of computer-readable instructions that, when executed by a processor of a device, cause the device to implement the techniques presented herein. Such computer-readable media may also include (as a class of technologies that are distinct from computer-readable storage media) various types of communications media, such as a signal that may be propagated through various physical phenomena (e.g., an electromagnetic signal, a sound wave signal, or an optical signal) and in various wired scenarios (e.g., via an Ethernet or fiber optic cable) and/or wireless scenarios (e.g., a wireless local area network (WLAN) such as WiFi, a personal area network (PAN) such as Bluetooth, or a cellular or radio network), and which encodes a set of computer-readable instructions that, when executed by a processor of a device, cause the device to implement the techniques presented herein.

An exemplary computer-readable medium that may be devised in these ways is illustrated in FIG. 6, wherein the implementation 100 comprises a computer-readable medium 102 (e.g., a CD-R, DVD-R, or a platter of a hard disk drive), on which is encoded computer-readable data 104. This computer-readable data 104 in turn comprises a set of computer instructions 106 configured to operate according to the principles set forth herein. In one such embodiment, the processor-executable instructions 106 may be configured to perform a method of fulfilling requests targeting a data set of a data store, such as the exemplary method 60 of FIG. 4. In another such embodiment, the processor-executable instructions 106 may be configured to implement a method of applying a query to a data set stored by a data store, such as the exemplary method 80 of FIG. 5. Some embodiments of this computer-readable medium may comprise a nontransitory computer-readable storage medium (e.g., a hard disk drive, an optical disc, or a flash memory device) that is configured to store processor-executable instructions configured in this manner. Many such computer-readable media may be devised by those of ordinary skill in the art that are configured to operate in accordance with the techniques presented herein.

The techniques discussed herein may be devised with variations in many aspects, and some variations may present additional advantages and/or reduce disadvantages with respect to other variations of these and other techniques. Moreover, some variations may be implemented in combination, and some combinations may feature additional advantages and/or reduced disadvantages through synergistic cooperation. The variations may be incorporated in various embodiments (e.g., the exemplary method 60 of FIG. 4 and the exemplary method 80 of FIG. 5) to confer individual and/or synergistic advantages upon such embodiments.

A first aspect that may vary among embodiments of these techniques relates to the scenarios wherein such techniques may be utilized. As a first variation, many types of data stores 18 (and/or compute nodes 42) may be utilized to apply the queries 14 and requests 44 to a data set 20. As one such example, the data stores 18 and/or compute nodes 42 may comprise distinct hardware devices (e.g., different machines or computers), distinct circuits (e.g., field-programmable gate arrays (FPGAs)) operating within a particular hardware device, or software processes (e.g., separate threads) executing within one or more computing environments on one or more processors of a particular hardware device. The data stores 18 and/or compute nodes 42 may also comprise virtual processes, such as distributed processes that may be incrementally executed on various devices of a device set. Additionally, respective data stores 18 may internally store the data items 52 comprising the data set 20, or may have access to other data stores 18 that internally store the data items 52 (e.g., a data access layer or device interfacing with a data storage layer or device). As a second variation, many types of data sets 20 may be accessed using the techniques presented herein, such as a database, a file system, a media library, an email mailbox, an object set in an object system, or a combination of such data sets 20. Similarly, many types of data items 52 may be stored in the data set 20. As a third variation, the queries 14 and/or requests 44 evaluated using the techniques presented herein may be specified in many ways. For example, a query 14 may be specified according to a Structured Query Language (SQL) variant, as a language-integrated query (e.g., a LINQ query), or an interpretable script or executable object configured to perform various manipulations of the data items 52 within the data set 20. The request 44 may also be specified in various ways, e.g., simply specifying an indexed attribute 24 and one or more values of such attributes 24 of data items 52 to be included in the filtered data subset 58. While the request 44 is limited to one or more filter criteria 54 specifying the data items 52 to be included in the filtered data subset 58, the language, syntax, and/or protocol whereby the query 14 and request 44 are formatted may not significantly affect the application or implementation of the techniques presented herein.

A second aspect that may vary among embodiments of these techniques relates to the storing of data items 52 in the data set 20 by the data store 18. As a first variation, a data store 18 may comprise at least one index, which may correspond to one or more filter criteria 54 (e.g., a particular attribute 24, such that records 26 containing one or more values for the attribute 24 are to be included in the filtered data subset 58). A data store 18 may be configured to, upon receiving a data item 52, index the data item in the index according to the filter criterion 54 (e.g., according to the value of the data item 52 for one or more attributes 24 that may be targeted by a filter criterion 54). The data store 18 may then be capable of fulfilling a request 44 by identifying the data items 52 satisfying the filter criteria 54 of the request 44 by using an index corresponding to the filter criterion 54. It may be advantageous to choose attributes 24 of the data items 52 for indexing that are likely to be targeted by filter criteria 54 of requests 44, and to refrain from indexing the other attributes 24 of the data items 52 (e.g., indices have to be maintained as data items 52 change, and it may be disadvantageous to undertake the computational burden of such maintenance in order to index an attribute 24 that is not likely to be frequently included as a filter criterion 54). For example, in a database configured to track events performed by various users at various times, it may be desirable to configure a data store 18 to generate and maintain indices for an index set comprising an event index specifying an event represented by respective data items 52; a time index specifying a time of an event represented by respective data items 52; and a user index specifying at least one user associated with an event represented by respective data items 52. However, it may not be desirable to generate and maintain indices for other attributes 24 of this data set 20, such as a uniform resource identifier (URI) of a digital resource involved in the request, a comment field whereupon textual comments regarding particular events may be entered by various users and administrators, or a “blob” field involving a large data set involved in the event (e.g., a system log or a captured image that depicts the event).
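The selective indexing described above may be sketched as follows. The class name `IndexedStore` and the attribute names `event`, `time`, `user`, and `comment` are assumptions chosen to mirror the event-tracking example; the sketch indexes only the three attributes likely to be targeted by filter criteria and leaves the comment field unindexed.

```python
from collections import defaultdict

# Attributes assumed likely to appear as filter criteria (hypothetical names).
INDEXED_ATTRIBUTES = ("event", "time", "user")


class IndexedStore:
    """Hypothetical data store maintaining one index per chosen attribute."""

    def __init__(self):
        self.items = []
        # One index per attribute: attribute value -> positions of data items.
        self.indices = {attr: defaultdict(list) for attr in INDEXED_ATTRIBUTES}

    def store(self, item):
        position = len(self.items)
        self.items.append(item)
        # Index the item upon receipt, for the indexed attributes only.
        for attr in INDEXED_ATTRIBUTES:
            if attr in item:
                self.indices[attr][item[attr]].append(position)

    def fulfill(self, attribute, value):
        if attribute not in self.indices:
            raise KeyError(f"no index is maintained for attribute {attribute!r}")
        return [self.items[p] for p in self.indices[attribute].get(value, [])]


indexed_store = IndexedStore()
indexed_store.store({"event": "login", "time": "2010-12", "user": "alice", "comment": "ok"})
indexed_store.store({"event": "upload", "time": "2010-12", "user": "bob", "comment": "large file"})
indexed_store.store({"event": "login", "time": "2011-01", "user": "bob", "comment": "retry"})

logins = indexed_store.fulfill("event", "login")
```

In this sketch a request on an unindexed attribute is simply rejected; an embodiment might instead fall back to scanning the data set.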

As a further variation of this second aspect, the index may identify data items 52 associated with one or more particular filter criterion values for a particular filter criterion 54 in various ways. As one such example, an index may specify, for a filter criterion value of a filter criterion 54 corresponding to the index, a data item set that identifies the data items having the filter criterion value for the filter criterion 54. For example, the index may store, for each filter criterion value of the filter criterion 54, a set of references to the data items 52 associated with the filter criterion value. Additionally, the data item set stored in the index may be accessible in various ways. For example, the index may permit incremental writing to the data item set (e.g., indexing a new data item 52 by adding the data item 52 to the data item set of data items having the filter criterion value for the filter criterion), but may permit only atomic reading of the data item set (e.g., for a request 44 specifying a particular filter criterion value for a particular filter criterion 54, the index may read and present the entire data item set, comprising the entire set of references to such data items 52). As a further variation, the data store 18 may, upon receipt of respective data items 52, store the data items 52 in a data item buffer, such that, when the data item buffer exceeds a data item buffer size threshold (e.g., the capacity of the data item buffer), the data store 18 may add the data items to respective data item sets and empty the data item buffer.
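The buffered, append-only behavior described above may be sketched as follows. The class `BufferedIndex` and its method names are hypothetical, and the sketch assumes that the buffer is flushed as soon as it reaches the threshold and that a read returns the whole data item set as an immutable tuple.

```python
from collections import defaultdict


class BufferedIndex:
    """Hypothetical index with a data item buffer and whole-set reads."""

    def __init__(self, criterion, buffer_threshold):
        self.criterion = criterion
        self.buffer_threshold = buffer_threshold
        self.buffer = []
        self.item_sets = defaultdict(list)  # criterion value -> data item set

    def receive(self, item):
        # New data items are first placed in the data item buffer.
        self.buffer.append(item)
        if len(self.buffer) >= self.buffer_threshold:
            self.flush()

    def flush(self):
        # Incremental writing: append each buffered item to its data item set.
        for item in self.buffer:
            self.item_sets[item[self.criterion]].append(item)
        self.buffer.clear()

    def read(self, value):
        # Atomic reading: the entire data item set is returned at once.
        return tuple(self.item_sets.get(value, ()))


index = BufferedIndex(criterion="month", buffer_threshold=2)
index.receive({"id": 1, "month": "2010-12"})
index.receive({"id": 2, "month": "2010-11"})   # threshold reached: flushed
index.receive({"id": 3, "month": "2010-12"})   # remains buffered for now

visible_before_flush = index.read("2010-12")
index.flush()
visible_after_flush = index.read("2010-12")
```

One consequence of this particular design is that buffered items are not visible to reads until the next flush; whether that is acceptable depends on the embodiment.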

FIG. 7 presents an illustration of an exemplary scenario 110 featuring an indexing of data items 52 in one or more data item sets 118 indexed according to an index 112. In this exemplary scenario 110, the data store 18 may receive various data items 52 (e.g., a set of reported events) and may store such data items 52 in a data set 20. In particular, the data store 18 may generate an index 112, comprising a set of index entries 114 including references 116 to one or more data items 52 of one or more data item sets 118, each corresponding to a different filter criterion value for a filter criterion 54 (e.g., the month and year of a date when an event occurred). Upon receiving a data item 52, the data store 18 may identify one or more filter criterion values of the data item 52, and may store a reference to the data item 52 in an index entry 114 of the index 112 corresponding to the filter criterion value. The data store 18 may then store the data item 52 in the data set 20 (e.g., by appending the data item 52 to a list of records 26). When a user 12 submits a request 44 to a data store 18 (either directly or indirectly, e.g., by submitting a query 14 to a compute node 42 that is configured to generate from the query 14 a request 44 specifying one or more filter criteria 54), the data store 18 may fulfill the request 44 by retrieving a data item set 118 associated with the filter criterion value, and in particular may do so by identifying the index entry 114 of the index 112 identifying the data items 52 of the data item set 118 corresponding to the filter criterion value. The data store 18 may then use the references 116 stored in the index entry 114 to retrieve the data items 52 of the data item set 118, and may send such data items 52 as the filtered data subset 58. In this manner, even if the data items 52 are stored together in an arbitrary manner, the data store 18 may fulfill the request 44 in an efficient manner by using the index 112 corresponding to the filter criterion 54 of the request 44. As a further example, respective index entries 114 of an index 112 may store, for a first filter criterion value of a filter criterion 54, references to data item partitions corresponding to respective second filter criterion values of a second filter criterion 54. Data items 52 may be stored and/or retrieved using this two-tier indexing technique. For example, storing a data item 52 may involve using the index 112 to identify the index entry 114 associated with a first filter criterion value of a first filter criterion 54 for the data item 52, examining the data item partitions referenced by the index entry 114 to identify the data item partition associated with a second filter criterion value of a second filter criterion 54 for the data item 52, and storing the data item 52 in the data item partition. Conversely, retrieving data items 52 having a particular first filter criterion value of a first filter criterion 54 and a particular second filter criterion value of a second filter criterion 54 may involve using the index 112 to identify the index entry 114 associated with the first filter criterion value; examining the data item partitions referenced in the index entry 114 to identify the data item partition associated with the second filter criterion value; and retrieving and sending the data item partition in response to the request 44.
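The two-tier indexing technique may be sketched as follows. The class `TwoTierIndex` is a hypothetical name, and the sketch assumes that each index entry is a mapping from second filter criterion values to data item partitions held as in-memory lists.

```python
from collections import defaultdict


class TwoTierIndex:
    """Hypothetical two-tier index: first value -> second value -> partition."""

    def __init__(self, first_criterion, second_criterion):
        self.first_criterion = first_criterion
        self.second_criterion = second_criterion
        # Each index entry references partitions keyed by the second value.
        self.entries = defaultdict(lambda: defaultdict(list))

    def store(self, item):
        # Identify the index entry for the first value, then the partition
        # for the second value, and store the data item in that partition.
        entry = self.entries[item[self.first_criterion]]
        entry[item[self.second_criterion]].append(item)

    def retrieve(self, first_value, second_value):
        entry = self.entries.get(first_value)
        if entry is None:
            return []
        # Return a copy of the whole partition for the second value.
        return list(entry.get(second_value, []))


two_tier = TwoTierIndex(first_criterion="month", second_criterion="user")
two_tier.store({"id": 1, "month": "2010-12", "user": "alice"})
two_tier.store({"id": 2, "month": "2010-12", "user": "bob"})
two_tier.store({"id": 3, "month": "2011-01", "user": "alice"})
two_tier.store({"id": 4, "month": "2010-12", "user": "alice"})

partition = two_tier.retrieve("2010-12", "alice")
```

Using `.get` during retrieval avoids creating empty entries for values that were never stored.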

As a further variation of this second aspect, a data store 18 may configure an index as a set of partitions, each including the data items 52 (or references thereto, e.g., a memory reference or URI where the data item 52 may be accessed, or a distinctive identifier of the data item 52, such as a key value of a key field of a data table 22) satisfying a particular filter criterion 54. For example, the data store 18 may generate various partitions, such as small sections of memory allocated to store data items 52 having a particular filter criterion value of a particular filter criterion 54. Upon receiving a data item 52, the data store 18 may store the data item 52 in the corresponding partition; and upon receiving a request 44 specifying a filter criterion value of a particular filter criterion 54, the data store 18 may retrieve the data item partition storing the data items 52 having the filter criterion value for the filter criterion, and send the data item partition as the filtered data subset 58. As a still further variation, two or more indices may be utilized to group data items according to two or more filter criteria 54.

FIG. 8 presents an illustration of an exemplary scenario 120 featuring a partitioning of data items 52 into respective data item partitions 122. In this exemplary scenario 120, the data store 18 may receive various data items 52 (e.g., a set of reported events) and may store such data items 52 in a data set 20. The data store 18 may again generate an index 112 (not shown), comprising a set of index entries 114 including references 116 to one or more data items 52 of one or more data item sets 118, each corresponding to a different filter criterion value for a filter criterion 54 (e.g., the month and year of a date when an event occurred). However, in contrast with the exemplary scenario 110 of FIG. 7, in this exemplary scenario 120 the data items 52 are stored in a manner that is partitioned according to the filter criterion value. Upon receiving a data item 52, the data store 18 may identify one or more filter criterion values of the data item 52, and may identify a data item partition 122 associated with the filter criterion value. The data store 18 may then store the data item 52 in the data item partition 122 corresponding to the filter criterion value. When a user 12 submits a request 44 to a data store 18 (either directly or indirectly, e.g., by submitting a query 14 to a compute node 42 that is configured to generate from the query 14 a request 44 specifying one or more filter criteria 54), the data store 18 may fulfill the request 44 by retrieving a data item set 118 associated with the filter criterion value, and in particular may do so by identifying the data item partition 122 associated with the filter criterion value. The data store 18 may then retrieve the entire data item partition 122, and may send the entire data item partition 122 to the user 12. Additional data item partitions 122 may be retrieved and sent in response to other filter criteria 54 (e.g., two or more filter criterion values for a particular filter criterion 54, or a filter criterion value specified in the alternative for each of two or more different filter criteria 54). In this manner, the data store 18 may identify and provide the data items 52 satisfying the filter criterion 54 in an efficient manner by using the data item partitions 122 corresponding to one or more filter criteria 54 specified in the request 44. Those of ordinary skill in the art may devise many ways of storing data items 52 of a data set 20 in accordance with the techniques presented herein.
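The partitioned storage of the exemplary scenario 120 may be sketched as follows. The class `PartitionedStore` is hypothetical, and the sketch assumes a single filter criterion (the month of an event) with partitions held as in-memory lists; a request may name one or more filter criterion values, each of which selects an entire partition.

```python
from collections import defaultdict


class PartitionedStore:
    """Hypothetical data store that keeps one partition per criterion value."""

    def __init__(self, criterion):
        self.criterion = criterion
        self.partitions = defaultdict(list)  # criterion value -> partition

    def store(self, item):
        # Place the data item directly in the partition for its value.
        self.partitions[item[self.criterion]].append(item)

    def fulfill(self, *values):
        # Send each requested partition in its entirety, in request order.
        filtered_subset = []
        for value in values:
            filtered_subset.extend(self.partitions.get(value, []))
        return filtered_subset


partitioned = PartitionedStore(criterion="month")
partitioned.store({"id": 1, "month": "2010-11"})
partitioned.store({"id": 2, "month": "2010-12"})
partitioned.store({"id": 3, "month": "2010-11"})
partitioned.store({"id": 4, "month": "2011-01"})

november = partitioned.fulfill("2010-11")
two_months = partitioned.fulfill("2010-12", "2011-01")
```

Unlike the reference-based index of FIG. 7, no per-item lookup is needed here: fulfilling a request amounts to locating and returning whole partitions.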

A third aspect that may vary among embodiments of these techniques involves the configuration of a data store 18 and/or a compute node 42 to retrieve data items 52 satisfying the filter criteria 54 of a request 44. As a first variation, the request 44 may comprise many types of filter criteria 54. In particular, the request 44 may specify a first filtered data subset 58 that may relate to the data items 52 comprising a second filtered data subset 58, and the data store 18 may utilize the first filtered data subset 58 while generating the second filtered data subset 58. For example, a query 14 may involve a request 44 specifying another filtered data subset 58 (e.g., in the query 14 “select username from users where user.id in (10, 22, 53, 67)”, a request 44 is filtered according to a set of numeric user IDs presented as a filtered data subset 58). As a further variation, a query 14 may involve a first request 44 specifying a first filtered data subset 58, which may be referenced in a second request 44 specifying a second filtered data subset 58. For example, in the query 14 “select username from users where user.id in (select users from events where event.type=12)”, a first filtered data subset 58 is generated from the events data table (using a first request 44, e.g., “SET_1=event.type=12”), and the first filtered data subset 58 is referenced by a second request 44 (e.g., “user.id in SET_1”), resulting in a second filtered data subset 58. In this manner, a request 44 may reference a filtered data subset 58 generated by another request 44, including an earlier request 44 provided and processed while evaluating the same query 14.
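The chaining of requests in the nested-query example may be sketched as follows. The helper names `filter_equals` and `filter_in`, the sample tables, and the `user_id` attribute are assumptions made for illustration; the first request produces a set corresponding to SET_1, which the second request then references.

```python
def filter_equals(items, attribute, value):
    """Hypothetical request form: attribute equals a single value."""
    return [item for item in items if item.get(attribute) == value]


def filter_in(items, attribute, allowed_values):
    """Hypothetical request form: attribute is a member of an earlier subset."""
    return [item for item in items if item.get(attribute) in allowed_values]


# Sample data (illustrative only).
events = [
    {"user_id": 10, "type": 12},
    {"user_id": 22, "type": 7},
    {"user_id": 53, "type": 12},
]
users = [
    {"id": 10, "username": "alice"},
    {"id": 22, "username": "bob"},
    {"id": 53, "username": "carol"},
]

# First request: SET_1 = event.type = 12 (keeping only the user identifiers).
set_1 = {event["user_id"] for event in filter_equals(events, "type", 12)}

# Second request: user.id in SET_1.
second_subset = filter_in(users, "id", set_1)

# Remainder of the query, applied by the compute node: project the username.
usernames = [user["username"] for user in second_subset]
```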

As a second variation of this third aspect, when presented with a request 44 including at least one filter criterion 54, a data store 18 may be configured to retrieve from the data set 20 the data items 52 satisfying respective filter criteria 54 of the request 44 (e.g., by utilizing an index 112 to identify a data item set 118 and/or data item partition 122, as in the exemplary scenario 110 of FIG. 7 and the exemplary scenario 120 of FIG. 8). Alternatively, rather than utilizing an index, the data store 18 may retrieve all of the data items 52 of the data set 20, and may send (e.g., to a compute node 42 or user 12 submitting the request 44 to the data store 18) only the data items 52 satisfying the at least one filter criterion. In the former example, the filtering of data items 52 is achieved during the indexing of the data items 52 upon receipt; but in the latter example, the filtering of data items 52 is achieved during the sending of the data items 52. It may be difficult to filter all of the data items 52 in realtime, e.g., in order to fulfill a request 44. However, some techniques may be utilized to expedite the realtime filtering of the data items 52, alternatively or in combination with the use of indices 112 and/or partitions 122.

FIG. 9 presents an illustration of an exemplary scenario 130 featuring one technique for implementing a realtime filtering of data items 52. In this exemplary scenario 130, a data store 18 receives from a user 12 a request 44 specifying at least one filter criterion 54, and endeavors to fulfill the request 44 by providing a filtered data subset 58 comprising only the data items 52 satisfying the filter criteria 54 of the request 44. However, in this exemplary scenario 130, the data store 18 retrieves all of the data items 52 from the data set 20, and then applies a data item processor set 132 to the entire set of data items 52 in order to identify and provide only the data items 52 satisfying the filter criteria 54. The data item processor set 132 may comprise, e.g., a set of data item processors 134, each having a state 136 and at least one filtering condition (e.g., a logical evaluation of any particular data item 52 to identify whether or not a filter criterion 54 is satisfied). The data item processors 134 may be individually configured to, upon receiving a data item 52, update the state 136 of the data item processor 134; and when the state 136 of the data item processor 134 satisfies the at least one filtering condition, the data item processor 134 may authorize the data item 52 to be sent (e.g., by including the data item 52 in the filtered data subset 58, or by sending the data item 52 to a different data item processor 134 for further evaluation). The data item processors 134 may therefore be interconnected and may interoperate, e.g., as a realtime processing system that evaluates data items 52 using a state machine. Accordingly, the data store 18 may invoke the data item processor set 132 upon the data items 52 retrieved from the data set 20, and may send only the data items 52 that have been authorized to be sent by the data item processor set 132. In this manner, the data store 18 may achieve an ad hoc, realtime evaluation of all data items 52 of the data set 20 to identify and deliver the data items 52 satisfying the filter criteria 54 of the request 44 without having to generate, maintain, or utilize indices 112 or partitions 122.
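A data item processor set of this kind may be sketched as follows. The class `DataItemProcessor`, the helper `count_by_user`, and the sample data are hypothetical; the sketch assumes two chained processors, the first passing only failed logins and the second authorizing an item once the same user has accumulated at least two failures.

```python
class DataItemProcessor:
    """Hypothetical processor holding a state and one filtering condition."""

    def __init__(self, update, condition, state, next_processor=None):
        self.update = update            # (state, item) -> new state
        self.condition = condition      # (state, item) -> bool
        self.state = state
        self.next_processor = next_processor

    def process(self, item):
        # Update the state upon receiving the data item.
        self.state = self.update(self.state, item)
        if not self.condition(self.state, item):
            return False
        # Either forward the item for further evaluation or authorize it.
        if self.next_processor is not None:
            return self.next_processor.process(item)
        return True


def count_by_user(state, item):
    """State update: number of items seen so far for each user."""
    updated = dict(state)
    updated[item["user"]] = updated.get(item["user"], 0) + 1
    return updated


repeat_failures = DataItemProcessor(
    update=count_by_user,
    condition=lambda state, item: state[item["user"]] >= 2,
    state={},
)
failures_only = DataItemProcessor(
    update=lambda state, item: item["type"],
    condition=lambda state, item: state == "login_failed",
    state=None,
    next_processor=repeat_failures,
)

data_items = [
    {"id": 1, "user": "alice", "type": "login_failed"},
    {"id": 2, "user": "bob", "type": "login_failed"},
    {"id": 3, "user": "alice", "type": "login_ok"},
    {"id": 4, "user": "alice", "type": "login_failed"},
    {"id": 5, "user": "bob", "type": "login_ok"},
]

# The data store streams every data item through the processor set and
# sends only the items that the set authorizes.
filtered_subset = [item for item in data_items if failures_only.process(item)]
```

Because each processor carries state between items, the decision for one data item can depend on items seen earlier in the stream, which a stateless predicate could not express.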

As a third variation of this third aspect, the data store 18 may, before providing a filtered data subset 58 in response to a request 44 (and optionally before retrieving the data items 52 matching the filter criteria 54 of the request 44), estimate the size of the filtered data subset 58. For example, a request 44 received by the data store 18 may involve a comparatively large filtered data subset 58 that may take a significant amount of computing resources to retrieve and send in response to the request 44. Therefore, for requests 44 received from a requester (e.g., a particular user 12 or automated process), an embodiment may first estimate a filtered data subset size of the filtered data subset 58 (e.g., a total estimated number of records 26 or data items 52 to be included in the filtered data subset 58), and may endeavor to verify that the retrieval of the filtered data subset 58 of this size is acceptable to the requester. Accordingly, an embodiment may be configured to, before sending a filtered data subset 58 in response to a request 44, estimate the filtered data subset size of the filtered data subset 58 and send the filtered data subset size to the requester, and may only proceed with the retrieval and sending of the filtered data subset 58 upon receiving a filtered data subset authorization from the requester. Conversely, a compute node 42 may be configured to, after sending a request 44 specifying at least one filter criterion 54 and before receiving a filtered data subset 58 in response to the request 44, receive from the data store 18 an estimate of a filtered data subset size of the filtered data subset 58, and may verify the filtered data subset size (e.g., by presenting the filtered data subset size to a user 12, or by comparing the filtered data subset size with an acceptable filtered data subset size threshold, defining an acceptable utilization of computing resources of the data store 18 and/or network 46). If the estimated filtered data subset size is acceptable, the compute node 42 may generate and send to the data store 18 a filtered data subset authorization, and may subsequently receive the filtered data subset 58. Those of ordinary skill in the art may devise many ways of configuring a data store 18 and/or a compute node 42 to retrieve data items 52 from the data set 20 in accordance with the techniques presented herein.
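The estimate-then-authorize exchange may be sketched as follows. The class `EstimatingStore`, the function `fetch_if_acceptable`, and the `authorized` flag are hypothetical; for simplicity the sketch computes the size by counting matches exactly, whereas an embodiment might derive an approximate estimate from index entries or partition sizes without retrieving the data items.

```python
class EstimatingStore:
    """Hypothetical data store that reports a subset size before sending."""

    def __init__(self, items):
        self.items = list(items)

    def estimate_size(self, attribute, value):
        # Simplifying assumption: an exact count stands in for the estimate.
        return sum(1 for item in self.items if item.get(attribute) == value)

    def retrieve(self, attribute, value, authorized):
        # Proceed only upon receiving a filtered data subset authorization.
        if not authorized:
            raise PermissionError("filtered data subset authorization required")
        return [item for item in self.items if item.get(attribute) == value]


def fetch_if_acceptable(store, attribute, value, size_threshold):
    """Hypothetical compute-node logic comparing the size with a threshold."""
    estimated_size = store.estimate_size(attribute, value)
    if estimated_size > size_threshold:
        return None  # withhold authorization; nothing is retrieved
    return store.retrieve(attribute, value, authorized=True)


estimating_store = EstimatingStore(
    {"id": n, "type": "login" if n % 2 else "upload"} for n in range(10)
)

accepted = fetch_if_acceptable(estimating_store, "type", "login", size_threshold=5)
declined = fetch_if_acceptable(estimating_store, "type", "login", size_threshold=3)
```

Here the threshold comparison stands in for either of the verification options described above; presenting the size to a user would replace the comparison with a prompt.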

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

As used in this application, the terms “component,” “module,” “system”, “interface”, and the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.

Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.

FIG. 10 and the following discussion provide a brief, general description of a suitable computing environment to implement embodiments of one or more of the provisions set forth herein. The operating environment of FIG. 10 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the operating environment. Example computing devices include, but are not limited to, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, mini computers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

Although not required, embodiments are described in the general context of “computer readable instructions” being executed by one or more computing devices. Computer readable instructions may be distributed via computer readable media (discussed below). Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer readable instructions may be combined or distributed as desired in various environments.

FIG. 10 illustrates an example of a system 140 comprising a computing device 142 configured to implement one or more embodiments provided herein. In one configuration, computing device 142 includes at least one processing unit 146 and memory 148. Depending on the exact configuration and type of computing device, memory 148 may be volatile (such as RAM, for example), non-volatile (such as ROM, flash memory, etc., for example) or some combination of the two. This configuration is illustrated in FIG. 10 by dashed line 144.

In other embodiments, device 142 may include additional features and/or functionality. For example, device 142 may also include additional storage (e.g., removable and/or non-removable) including, but not limited to, magnetic storage, optical storage, and the like. Such additional storage is illustrated in FIG. 10 by storage 150. In one embodiment, computer readable instructions to implement one or more embodiments provided herein may be in storage 150. Storage 150 may also store other computer readable instructions to implement an operating system, an application program, and the like. Computer readable instructions may be loaded in memory 148 for execution by processing unit 146, for example.

The term “computer readable media” as used herein includes computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data. Memory 148 and storage 150 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by device 142. Any such computer storage media may be part of device 142.

Device 142 may also include communication connection(s) 156 that allow device 142 to communicate with other devices. Communication connection(s) 156 may include, but are not limited to, a modem, a Network Interface Card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interfaces for connecting computing device 142 to other computing devices. Communication connection(s) 156 may include a wired connection or a wireless connection. Communication connection(s) 156 may transmit and/or receive communication media.

The term “computer readable media” may include communication media. Communication media typically embodies computer readable instructions or other data in a “modulated data signal” such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” may include a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.

Device 142 may include input device(s) 154 such as keyboard, mouse, pen, voice input device, touch input device, infrared cameras, video input devices, and/or any other input device. Output device(s) 152 such as one or more displays, speakers, printers, and/or any other output device may also be included in device 142. Input device(s) 154 and output device(s) 152 may be connected to device 142 via a wired connection, wireless connection, or any combination thereof. In one embodiment, an input device or an output device from another computing device may be used as input device(s) 154 or output device(s) 152 for computing device 142.

Components of computing device 142 may be connected by various interconnects, such as a bus. Such interconnects may include a Peripheral Component Interconnect (PCI), such as PCI Express, a Universal Serial Bus (USB), FireWire (IEEE 1394), an optical bus structure, and the like. In another embodiment, components of computing device 142 may be interconnected by a network. For example, memory 148 may be comprised of multiple physical memory units located in different physical locations interconnected by a network.

Those skilled in the art will realize that storage devices utilized to store computer readable instructions may be distributed across a network. For example, a computing device 160 accessible via network 158 may store computer readable instructions to implement one or more embodiments provided herein. Computing device 142 may access computing device 160 and download a part or all of the computer readable instructions for execution. Alternatively, computing device 142 may download pieces of the computer readable instructions, as needed, or some instructions may be executed at computing device 142 and some at computing device 160.

Various operations of embodiments are provided herein. In one embodiment, one or more of the operations described may constitute computer readable instructions stored on one or more computer readable media, which if executed by a computing device, will cause the computing device to perform the operations described. The order in which some or all of the operations are described should not be construed as to imply that these operations are necessarily order dependent. Alternative ordering will be appreciated by one skilled in the art having the benefit of this description. Further, it will be understood that not all operations are necessarily present in each embodiment provided herein.

Moreover, the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims may generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.

Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising.”

What is claimed is:
 1. Enacted on a compute node of a distributed database, the compute node comprising a processor, a method of fulfilling queries targeting a data set comprising a set of records that are stored in a data store of the distributed database separate from the compute node, the method comprising: executing, on the processor, instructions that cause the compute node to: receive a query targeting the data set, wherein the query specifies a computation to be applied only to a portion of the data set that matches at least one filter criterion; partition the query into a filter portion that filters the data set into a filtered data subset according to the at least one filter criterion, and a computation portion specifying at least one computation to be performed only on the filtered data subset of the data store; according to the filter portion of the query, generate a filtering request to retrieve a filtered data subset comprising a first portion of the data set satisfying the at least one filter criterion and excluding a second portion of the data set not satisfying the at least one filter criterion, according to the at least one filter criterion distinguishing the first portion from the second portion of the data set; send the generated filtering request to the data store to cause the data store to return all records satisfying the filter criterion and exclude all records not satisfying the filter criterion; and responsive to receiving the filtered data subset from the data store responsive to the filtering request, apply the computation portion of the query to the filtered data subset.
 2. The method of claim 1, wherein: the data store further comprises an index that specifies, for a filter criterion value of the filter criterion, a data item set identifying the data items of the data set having the filter criterion value for the filter criterion, wherein the index is selected from an index set further comprising: an event index specifying an event represented by respective data items; a time index specifying a time of an event represented by respective data items; and a user index specifying at least one user associated with an event represented by respective data items; and the filtering request further specifies the filter criterion that is indexed by the index stored by the data store.
 3. The method of claim 1, wherein: the data store further comprises a data item buffer that stores received data items; and storing the data item further comprises: for the data items stored in the data item buffer exceeding a data item buffer size threshold: adding respective data items of the data item buffer in the data item set, and emptying the data item buffer.
 4. The method of claim 1, wherein: the data store further comprises at least one data item partition that stores data items having a filter criterion value for a filter criterion; and invoking the data store with the request further comprises: for at least one filter criterion value for respective filter criteria, invoking the data store with the data item partition storing data items having the filter criterion value for the filter criterion.
 5. The method of claim 4, wherein executing the instructions further causes the computer to, responsive to receiving a data item: identify at least one filter criterion value for at least one filter criterion that is indexed by an index used by the data store; identify a data item partition storing data items having the filter criterion value for the filter criterion; and store the data item in the data item partition.
 6. The method of claim 5, wherein storing a data item further comprises: identifying a first filter criterion value of the data item for the first filter criterion; using the index, identifying the data item partitions storing data items having the first filter criterion value for the first filter criterion; identifying a second filter criterion value of the data item for the second filter criterion; among the data item partitions, identifying the data item partition storing data items having the second filter criterion value for the second filter criterion; and storing the data item in the data item partition.
 7. The method of claim 5, wherein invoking the data store with the request further comprises: invoking the data store with the request and a first filter criterion value for a first filter criterion and a second filter criterion value for a second filter criterion comprising: using the index, identifying the data item partitions storing data items having the first filter criterion value for the first filter criterion; among the data item partitions, identifying the data item partition storing data items having the second filter criterion value for the second filter criterion; and invoking the data item partition with the request to retrieve the at least one data item.
 8. The method of claim 1, wherein: the request further specifies a first filtered data subset that is to be used to generate the filtered data subset; and invoking the data store with the request further comprises: invoking the data store with the request and the first filtered data subset to retrieve the data items of the data set satisfying the at least one filter criterion and using the first filtered data subset to generate a filtered data subset.
 9. The method of claim 8, wherein the first filtered data subset is generated by the data store in response to a preceding request specifying at least one first filter criterion.
 10. The method of claim 1, wherein executing the instructions further causes the computer to, before invoking the request to retrieve the filtered data subset: estimate a filtered data subset size of the filtered data subset; send the filtered subset data size to the requester; and responsive to receiving from the requester a filtered data subset authorization, send the filtered data subset in response to the request.
 11. The method of claim 1, wherein: the data items of the data set are distributed between a first data store and a second data store; and invoking the data store with the filtering request further comprises: between the first data store and the second data store, identify a selected data store that is likely to contain the data items specified by the query; and invoke the selected data store with the filtering request.
 12. The method of claim 1, wherein: executing the instructions further causes the device to: among attributes of the data set, identify a selected attribute that is likely to be targeted by filter criteria; and responsive to receiving the data item to be stored with the data set, indexing the data item in an index according to the selected attribute.
 13. The method of claim 12, wherein: the index further comprises: a first index that indexes the data items according to a first attribute, and a second index that indexes the data items according to a second attribute; and indexing the data item further comprises: indexing the data item in the first index according to the first attribute; and indexing the data item in the second index according to the second attribute.
 14. A device configured as a compute node for a distributed database, the device being configured to apply queries to a data set comprising a set of records that are stored by a remote data store of the distributed database accessible through a remote server, the device comprising: a processor, and a memory storing instructions that, when executed by the processor, cause the device to: receive a query targeting the data set, wherein the query specifies a computation to be applied only to a portion of the data set that matches at least one filter criterion; partition the query into a filter portion that filters the data set into a filtered data subset that satisfies the at least one filter criterion, and a computation portion specifying at least one computation to be performed only on the filtered data subset; according to the filter portion of the query, generate a filtering request to retrieve a filtered data subset comprising a first portion of the data set satisfying the at least one filter criterion and excluding a second portion of the data set not satisfying the at least one filter criterion, according to the at least one filter criterion distinguishing the first portion from the second portion of the data set; send the generated filtering request to the remote data store to return all records satisfying the filter criterion and excluding all records not satisfying the filter criterion; and responsive to receiving the filtered data subset from the remote data store responsive to the filtering request, apply the computation portion of the query to the filtered data subset.
 15. The device of claim 14, wherein: the query further comprises: a first filter criterion generating a first filtered data subset, and a second filter criterion generating a second filtered data subset from the first filtered data subset; generating the request further comprises: generating a first request specifying the first data subset filtered according to the first filter criterion; sending the request to the remote data store further comprises: sending the first request to the remote data store; and applying the query further comprises: responsive to receiving from the remote data store the first filtered data subset in response to the first request: generating a second request specifying the second data subset filtered according to the second filter criterion and using the first filtered data subset; and sending the second request to the remote data store; and responsive to receiving from the remote data store the second filtered data subset in response to the second request, apply the computation portion of the query to the second filtered data subset.
 16. The device of claim 14, wherein executing the instructions further causes the device to, before receiving the filtered data subset from the remote data store: receive from the remote data store a filtered data subset size of the filtered data subset; verify the filtered data subset size to generate a filtered data subset authorization; and responsive to generating a filtered data subset authorization, send the filtered data subset authorization to the remote data store.
 17. A memory device storing instructions that, when executed on a processor of a computer configured as a compute node for a distributed database comprising a data store, cause the computer to apply queries to a data set stored by the data store, by: receiving a query targeting the data set and specifying a computation to be applied only to a portion of the data set that matches at least one filter criterion; partitioning the query into a filter portion that filters the data set into a filtered data subset according to the at least one filter criterion, and a computation portion specifying at least one computation to be performed only on the filtered data subset of the data store; according to the filter portion of the query, generating a filtering request to retrieve a filtered data subset comprising a first portion of the data set satisfying the at least one filter criterion and excluding a second portion of the data set not satisfying the at least one filter criterion, according to the at least one filter criterion distinguishing the first portion from the second portion of the data set; sending the generated filtering request to the data store to cause the data store to return all records satisfying the filter criterion and exclude all records not satisfying the filter criterion; and responsive to receiving the filtered data subset from the data store, applying the computation portion of the query only to the filtered data subset.
 18. The memory device of claim 17, wherein: the query further comprises: a first filter criterion generating a first filtered data subset, and a second filter criterion generating a second filtered data subset using the first filtered data subset; generating the request further comprises: generating a first request specifying the first data subset filtered according to the first filter criterion; sending the request to the data store further comprises: sending the first request to the data store; and applying the query further comprises: responsive to receiving from the data store the first filtered data subset in response to the first request: generating a second request specifying the second data subset filtered according to the second filter criterion and using the first filtered data subset; and sending the second request to the data store; and responsive to receiving from the data store the second filtered data subset in response to the second request, applying the computation portion of the query to the second filtered data subset.
 19. The memory device of claim 17, wherein executing the instructions further causes the computer to, before receiving the filtered data subset from the data store: receive from the data store a filtered data subset size of the filtered data subset; verify the filtered data subset size to generate a filtered data subset authorization; and responsive to generating a filtered data subset authorization, send the filtered data subset authorization to the data store.
 20. The memory device of claim 17, wherein: the data store further comprises an index that specifies, for a filter criterion value of the filter criterion, a data item set identifying the data items of the data set having the filter criterion value for the filter criterion; and the filtering request further specifies the filter criterion that is indexed by the index stored by the data store.
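
By way of a non-limiting illustration, the flow recited in claim 1 (receive a query, partition it into a filter portion and a computation portion, send only the filter portion to the data store, and apply the computation to the returned subset) might be sketched in Python as follows. All names used here (FilterCriterion, Query, DataStore, ComputeNode, and the fixed operator table) are hypothetical and chosen only for illustration; they are not drawn from the disclosure, and an actual embodiment may differ substantially.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# Assumption: the data store understands only this fixed table of comparisons,
# so a filtering request can never carry arbitrary code to the data store.
_OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


@dataclass(frozen=True)
class FilterCriterion:
    """Declarative criterion: <attribute> <op> <value> (hypothetical shape)."""
    attribute: str
    op: str
    value: Any


@dataclass(frozen=True)
class Query:
    """A query with a filter portion and an arbitrary computation portion."""
    criteria: List[FilterCriterion]
    computation: Callable[[List[Record]], Any]


class DataStore:
    """Stores records and answers filtering requests only."""

    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)

    def filter(self, request: List[FilterCriterion]) -> List[Record]:
        # Reject anything outside the fixed operator table.
        for criterion in request:
            if criterion.op not in _OPERATORS:
                raise ValueError(f"unsupported operator: {criterion.op!r}")
        # Return every record satisfying all criteria and exclude the rest.
        return [
            record for record in self._records
            if all(
                c.attribute in record
                and _OPERATORS[c.op](record[c.attribute], c.value)
                for c in request
            )
        ]


class ComputeNode:
    """Partitions a query and runs the computation locally."""

    def __init__(self, store: DataStore) -> None:
        self._store = store

    @staticmethod
    def partition(query: Query) -> Tuple[List[FilterCriterion], Callable]:
        # Filter portion goes to the data store; computation stays here.
        return list(query.criteria), query.computation

    def execute(self, query: Query) -> Any:
        filtering_request, computation = self.partition(query)
        filtered_subset = self._store.filter(filtering_request)
        return computation(filtered_subset)


store = DataStore([
    {"user": "ann", "event": "login", "time": 5},
    {"user": "bob", "event": "login", "time": 9},
    {"user": "ann", "event": "logout", "time": 12},
])
node = ComputeNode(store)
query = Query(criteria=[FilterCriterion("event", "==", "login")], computation=len)
result = node.execute(query)  # counts the login events on the compute node
```

In this sketch the computation portion is an ordinary Python callable that never leaves the compute node, while the data store sees only attribute names, operator symbols, and values, which mirrors the separation of filtering from computation described in the claims.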